Optimization Techniques for "Scaling Down" Hadoop on Multi-Core, Shared-Memory Systems

Authors

  • K. Ashwin Kumar
  • Jonathan Gluck
  • Amol Deshpande
  • Jimmy J. Lin
Abstract

The underlying assumption behind Hadoop and, more generally, the need for distributed processing is that the data to be analyzed cannot be held in memory on a single machine. Today, this assumption needs to be re-evaluated. Although petabyte-scale datastores are increasingly common, it is unclear whether “typical” analytics tasks require more than a single high-end server. Additionally, we are seeing increased sophistication in analytics, e.g., machine learning, where we process smaller and more refined datasets. To address these trends, we propose “scaling down” Hadoop to run on multi-core, shared-memory machines. This paper presents a prototype runtime called Hone (“Hadoop One”) that is API compatible with Hadoop. With Hone, we can take an existing Hadoop application and run it efficiently on a single server. This allows us to take existing MapReduce algorithms and find the most suitable runtime environment for execution on datasets of varying sizes. For dataset sizes that fit into memory on a single machine, our experiments show that Hone is substantially faster than Hadoop running in pseudo-distributed mode. In some cases, Hone running on a single machine outperforms a 16-node Hadoop cluster.
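The core idea above — running an unmodified MapReduce-style computation entirely in memory on one multi-core machine, with no HDFS or network shuffle — can be illustrated with a small sketch. This is not Hone's implementation (Hone is a Hadoop-API-compatible Java runtime); it is a hypothetical Python analogue using `multiprocessing` to show how map tasks parallelize across cores while all data stays in shared memory on a single server.

```python
from collections import Counter
from multiprocessing import Pool

def map_phase(chunk):
    # Map: count words within one in-memory chunk
    # (pre-aggregating per chunk, like a Hadoop combiner)
    return Counter(chunk.split())

def reduce_phase(partials):
    # Reduce: merge the per-chunk counts into one result
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def word_count(chunks, workers=4):
    # Map tasks run as processes on one machine; inputs and
    # intermediate results never leave local memory, so there
    # is no distributed file system or network shuffle.
    with Pool(workers) as pool:
        partials = pool.map(map_phase, chunks)
    return reduce_phase(partials)

if __name__ == "__main__":
    docs = ["hadoop on one machine", "one machine many cores"]
    print(word_count(docs))
```

The same program structure (map, shuffle/merge, reduce) is what lets a runtime like Hone reuse existing Hadoop job logic: only the execution substrate changes, from a cluster to threads or processes on one server.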


Similar Articles

Hone: "Scaling Down" Hadoop on Shared-Memory Systems

The underlying assumption behind Hadoop and, more generally, the need for distributed processing is that the data to be analyzed cannot be held in memory on a single machine. Today, this assumption needs to be re-evaluated. Although petabyte-scale datastores are increasingly common, it is unclear whether “typical” analytics tasks require more than a single high-end server. Additionally, we are ...


Composable Incremental and Iterative Data-Parallel Computation with Naiad

We report on the design and implementation of Naiad, a set of declarative data-parallel language extensions and an associated runtime supporting efficient and composable incremental and iterative computation. This combination is enabled by a new computational model we call differential dataflow, in which incremental computation can be performed using a partial, rather than total, order on time....


Architectural techniques to extend multi-core performance scaling

Sohail, Hamza Bin. PhD, Purdue University, May 2015. Architectural Techniques to Extend Multi-core Performance Scaling. Major Professor: T. N. Vijaykumar. Multi-cores have successfully delivered performance improvements over the past decade; however, they now face problems on two fronts: power and off-chip memory bandwidth. Dennard scaling is effectively coming to an end, which has led to a gr...


Memory Bottlenecks and Memory Contention in Multi-Core Monte Carlo Transport Codes

Current and next generation processor designs require exploiting on-chip, fine-grained parallelism to achieve a significant fraction of theoretical peak CPU speed. The success or failure of these designs will have a tremendous impact on the performance and scaling of a number of key reactor physics algorithms run on next-generation computer architectures. One key example is the Monte Carlo (MC)...


Performance Evaluation of Matrix Multiplication Using Mixed-Mode Optimization Techniques and OpenMP for Multi-Core Processors

Matrix multiplication is one of the most commonly used algorithms in many application areas, such as sonar systems, relational database management systems, and other applications in algebra. Matrix multiplication becomes quite difficult as matrix sizes grow very large. In this paper we study and evaluate the execution time of simple matrix multiplication and optimized matrix multiplication with OpenMP on m...



Journal:

Volume   Issue

Pages  -

Publication date: 2014